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Abstract 

Circular data are data measured in angles and occur in a variety of scientific 
disciplines. Bayesian methods promise to allow for flexible analysis of circu¬ 
lar data. Three existing MCMC methods (Gibbs, Metropolis-Bastings, and 
Rejection) for a single group of circular data were extended to be used in a 
between-subjects design, providing a novel procedure to compare groups of 
circular data. Investigating the performance of the methods by simulation 
study, all methods were found to overestimate the concentration parameter 
of the posterior, while coverage was reasonable. The rejection sampler per¬ 
formed best. In future research, the MCMC method may be extended to 
include covariates, or a within-subjects design. 
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1. Introduction 

Circular data are data measured in angles or orientations in two-dimensional 
space. For example, one may imagine directions on a compass (0° — 360°), 
times of the day (0 — 24 hours), or directions on a circumplex model, such 
as Leary’s Circle (Leary, 1957). Circular data are frequently encountered 
in many scientific disciplines, such as biology, social sciences, meteorology, 
astronomy, earth sciences, and medicine. 
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The analysis of circular data requires special directional statistical meth¬ 
ods due to the periodicity of the sample space. For example, two angles of 
10° and 350° differ by only 20°, while if treated linearly the distance between 
them would seem to be 340°. A similar mismatch occurs for the arithmetic 
mean of 10° and 350°, which is 180°, while their correct circular mean is 0°. 

Three different approaches for analysis of circular data are discussed in 
the literature: the intrinsic approach, which uses the von Mises distribution 
(Von Mises, 1918; Damien and Walker, 1999); the embedding approach, which 
employs the Projected Normal distribution (Nunez-Antonio and Gutierrez- 
Pena, 2005); and the wrapping approach, where distributions on the real line 
are wrapped around the circle (Ferrari, 2009). The intrinsic approach is the 
most prominent in the literature, perhaps because this is currently the only 
approach which allows calculation of maximum likelihood estimates (Ferrari, 
2009). Additionally, mapping the circular sample space to a sample space 
in either (wrapping) or (embedding) generally leads to an increase in 
the amount of parameters to be estimated, which may make these methods 
more complex. Because of these reasons, the scope is limited to the intrinsic 
approach here. 

Due to the difficulty of working with a circular sample space, few methods 
have been developed in the held of analysis of circular data. An overview 
of available frequentist methods for analysis of circular data can be found 
in Fisher (1995) and Mardia and Jupp (1999). Bayesian methods offer a 
promising new approach not only in the held of statistics at large, but also 
specihcally in the analysis of circular data. Main advantages of the Bayesian 
approach are the hexibility of Markov chain Monte Carlo (MCMC) meth¬ 
ods used in Bayesian analysis, the lack of asymptotic assumptions, and the 
possibility to incorporate knowledge from previous research. Some work has 
been done performing Bayesian estimation on circular data without utilising 
MCMC methods (Dowe et al., 1996), but such methods only perform point 
estimation without providing standard errors, while researchers are often in¬ 
terested in drawing inference. 

In the case of directional statistics, MCMC methods may prove to be a 
hexible solution to the difficulty of drawing inference from circular data. A 
limited number of MCMC methods for circular data have been developed. 
Available methods generally employ the von Mises distribution, which is the 
natural analogue of the normal distribution on the circle. Early work by 
Damien and Walker (1999) provided a Gibbs sampler for a single group by 
adding latent variables to the model. Metropolis-Bastings algorithms have 
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been developed for circular distributions in general (Bhattacliarya and Sen- 
gupta, 2009) and for the von Mises-Fisher distribution, which is the gen¬ 
eralization of the von Mises distribution to the sphere (Nunez-Antonio and 
Gutierrez-Pena, 2005). Recent work has attempted to tune the parameters 
of a rejection sampling algorithm in order to obtain a computationally fast 
method to sample from the posterior of a von Mises distribution (Forbes and 
Mardia, 2014). Although different in approach, these methods have in com¬ 
mon that they draw from the posterior of the von Mises distribution given 
one group of circular data, which can be used to describe properties of a 
single sample. None of the methods may be used to compare groups. 

In this paper existing MCMC methods will be extended to analyse data 
from between-subjects designs, where the research goal is to compare mean 
directions of multiple groups on a circular outcome. Many tests in between- 
subjects designs, such as ANOVA, assume equal variance across groups. Cir¬ 
cular ANOVA methods that have been developed in a frequentist framework 
also carry this assumption (Harrison and Kanji, 1988; Harrison et ah, 1986). 
A main aim of this paper is thus to extend available MCMC methods to 
between-subjects designs, so that the method samples multiple mean direc¬ 
tions and a single measure of dispersion. Then, the performance of these 
methods will be assessed to decide which is the most commendable. 

Section 2 provides the theoretical framework and notation for the von 
Mises distribution. Then, in Section 3, three MCMC methods for between- 
subjects designs are discussed. These are compared by means of a simulation 
study in Section 4. Concluding remarks are given in Section 5. 

2. The intrinsic approach 

The MCMC methods discussed in this paper all fall within the intrinsic 
approach, where it is assumed that the data follow the von Mises distribution. 
This section will discuss basic properties of the von Mises distribution and 
provide a framework for the MCMC methods that will be discussed in Section 

3. The first four sections will be restricted to the von Mises distribution for 
a single group, while Section 2.5 will introduce properties and notation to be 
used in the case with multiple groups. 
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2.1. Von Mists distribution 

The von Mises distribution is a symmetric uniniodal distribution, which 
is given by 

VM(6*|/i, k) = {27tIo{k)}~^ exp{K cos(6' — fi)}, 0 < 6 < 27r, k > 0 

where 6 represents the data, /i represents the mean direction, k is the con¬ 
centration parameter, and Jo(-) is the modified Bessel function of order 0 
(Abramowitz and Stegun, 1972). A higher k represents less variation, and 
thus more concentrated data. Let 6 = (6*i,..., 6*n) be a sample of angular 
measurements 9i of size n. 

Each angle in the dataset may be viewed as a vector of length 1 in direc¬ 
tion 9i. As illustrated in Figure 1, the summation of these vectors results in 
a vector in direction 9 of length R. 9 is an unbiased estimator of /r, while R 
is called the resultant length and may be obtained from 


R 



+ 



2 


1 


which increases with concentration and sample size. In the von Mises model, 
i? is a sufficient statistic for k. The mean resultant length can be computed 
as R = R/n, which is a metric of concentration independent of the sample 
size. 


2.2. Prior distribution 

Guttorp and Lockhart (1988) present a conjugate prior for the von Mises 
distribution. It is given up to a constant of proportionality by 

p(/i, k) oc Jo(K)~'^exp{i?oKcos(/i — /io)}, 

which represents c observations with prior mean direction po and prior re¬ 
sultant length Rq. In all methods applied in this paper, this conjugate prior 
will be used. 
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Figure 1: Illustration of the mean direction and resultant length of 6 = {56°, 77°, 344°}. 
The summation of the vectors results in a vector of length R in direction 9. 


2.3. Posterior distribution 

To obtain the posterior distribution, the data and the prior are combined 
to obtain the posterior mean and the posterior resultant length by 


n 

Cn = Ro COS /io + ^ COS 9i, 
i=l 


n 

Sn = Rq sin Po + ^ sin 9i, 

i=l 


and 


/^n 


tan-^{Sn/Cn) 
tSiia-^{Sn/Cn) + TT 
tan"^(S'„/C'n) + 2 % 


if > 0, > 0 

if < 0 

if > 0, S'n < 0 


R„ = VCi+M. 


Then, the joint posterior distribution is given up to a constant of propor¬ 
tionality, by 


/(/i, fi:|0) oc {Jo(k)} "*exp{i?„«:cos(/i-/i„)}, 

where m = n + c. This distribution is not of closed form due to the Bessel 
function. 


^In R (R Development Core Team, 2015), calculation of is readily available in 
atan2(S'„, Cn) ■ 
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Conditional distributions 

The MCMC methods presented in Section 3 are based upon the condi¬ 
tional posterior distributions /(/i|K, 0) and /(fi:|/i, 0). The conditional pos¬ 
terior distribution of p, up to a constant of proportionality, is given by 


/(p|k, 0 ) oc exp{i?,^Kcos(/i -/in)}, 


which is the kernel of a von Mises distribution with mean direction /in and 
concentration parameter Rn^. Several straightforward methods to sample 
data from the von Mises distribution are available. (Best and Fisher, 1979; 
Fisher, 1995) 

The conditional distribution of /(«:|/i,0), is given by 


/(fi:|/i,0) oc {Jo(k)} "*exp{i?nKcos(/i-/in)}. 


However, it is not straightforward to sample from this conditional distribu¬ 
tion, so that special methods are required. In Section 3, three methods that 
can sample the concentration parameter will be discussed. 

2.5. Notation for multiple groups 

Here, basic notation and properties will be defined that will be used to 
extend the methods discussed in Section 3 to multiple groups. Denote the 
groups by j = 1,..., J. Then, for group j, the posterior mean is denoted by 
/inj and the posterior resultant length by Rnj. The sample size of group j is 
denoted by n^, which will be combined with the prior property Cj to obtain 
ruj = Hj + Cj. Finally, let 


j 


,/ 



Utilising this notation, the posterior for multiple groups with a common 
K is given by 



where = (/xi,... ,/ij) denotes the mean directions of the groups. 
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3. MCMC Sampling 


In this section, Bayesian methods will be discussed that are able to sample 
from the posterior of a von Mises distribution in a between-subjects design 
with J > 1 independent groups with common but unknown n. Specifically, 
three novel MCMC methods will be presented: a Gibbs sampler based on 
work by Damien and Walker (1999), a Metropolis-Bastings sampler, and a 
rejection sampler based on work by Forbes and Mardia (2014). Importantly, 
all three methods employ the conjugate prior as described in Section 2.2. 

3.1. A Gibbs sampler using latent variables 

In one of the earliest attempts at sampling the concentration parameter 
of the von Mises distribution, Damien and Walker (1999) provide a Gibbs 
sampler that only requires sampling of uniform random variates. It is an 
application of the procedure of adding latent variables to a posterior distri¬ 
bution in order to be able to apply Gibbs samplers in situations where this 
may not have been feasible originally (Damien et ah, 1999). 

Although the relative simplicity of the Gibbs sampler usually is appealing, 
it has been noted that this sampler shows high autocorrelation for more 
concentrated data, causing slow convergence (Nunez-Antonio and Gutierrez- 
Pena, 2005, p. 990). 

Damien and Walker (1999) add latent variables w, v, x, and u = {ui, U 2 ,...) 
to the joint posterior density /(/x, k| 0), where u is an infinite set of latent 
variables. It is not necessary to sample an infinite number of values for 
as computing values for Uk up to some sufficient k provides a good approx¬ 
imation of the correct solution. Let Z be the number of values of Uk that 
will be sampled, so that the set of sampled values is ui,... ,uz. For each 
analysis performed with this method, a value for Z must be chosen. This is 
a disadvantage of this method, because setting Z too high will prove com¬ 
putationally intensive, while setting Z too low produces biased results. 

Another disadvantage is that this method requires setting starting values 
not only for /i and k, but also for w. However, w does not have an intuitive 
interpretation, making the choice of a starting value somewhat arbitrary and 
possibly difficult. 
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3.1.1. Sampler for a single group 

The posterior density of a single group, after inclusion of the latent vari¬ 
ables, is given up to a constant of proportionality as 



for which the marginal for (p, k) is /(/i, k\6), as required. The Gibbs sampler 
works by drawing a value from the conditional distributions of x,v, ft,Uk,w 
and K in sequential order, each conditional on the current other values. Fur¬ 
ther details, including the required conditional distributions, are found in 
Damien and Walker (1999) and will not be given here, as the Gibbs sampler 
for a single group is a special case of the Gibbs sampler described next with 
J = 1. 

3.1.2. Sampler for multiple groups 

This section will describe the adapted procedure to implement the Gibbs 
sampler for multiple groups, so that it will sample from the posterior density 
f{^i,K,w,v,u,x\6). It differs in two ways from the sampler provided in 
Damien and Walker (1999): first, means for multiple groups and a common 
K are now sampled, and second, some steps were combined or simplified to 
facilitate implementation. Notably, the sampling of a set of values ui,... ,uz 
is rewritten to sample another set of values Ni,..., Nz- The extended Gibbs 
sampler consists of the following 8 steps: 

1. Set n, K, and w to their starting values. 

2. Draw a random variate r from 17(0,1). 

3. For each group j, draw a value for fij from U{pinj — cos~^ g, finj + 
cos~^ g), where 


Rnj {1 COs(/ij - pLnj)] 

72 ^ 


In r 


g = max —1, — -h 


4. Galculate M = w + E., where w is the current value of w and E is 
a random variate drawn from an exponential distribution with rate 


/o(k) - 1. 
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5. Draw a new value for w from < w < M), where r is a 

uniform random variate from 17(0,1). 

6. Compute = k( 1 + where is an exponential r.v. with 

rate w{k\)~‘^{0.5KY^, and k = 1,... ,Z. Set N = minA^fc. For advice 
on setting Z, see Section 3.1.3. 

7. Draw a value for k from e“'^"'^/(max{0, Un} < k < N), where 

Inr 

Vn = - j -h K. 

^nj {1 + COs{fij - finj)} 

8. Repeat steps 2-7 until a sufficient number of samples have been ob¬ 
tained. 

3.1.3. Choosing Z 

In step 6 of the procedure given above to draw from the conditional 
density of k, a number of samples are generated of which the smallest is 
retained. However, the number of Nk that should be sampled (here denoted 
by Z) was not discussed in Damien and Walker (1999). A small simulation 
study was performed to be able to give guidelines for setting Z when applying 
this algorithm. 

For each combination of sample sizes {5, 30,100} and concentrations {0.1,1,4, 8,16, 32}, 
100 datasets with J = 1 and J = 3 were generated. The Gibbs sampler was 
then run for 10000 iterations with no lag and a burn-in of 500 on each dataset, 
with Z set to 40. In each iteration, the index number k of the selected (small¬ 
est) value for was saved. This resulted in 100 chains (one for each dataset) 
of chosen index numbers k of 10000 iterations. Then, the overall maximum 
value of these chains was taken. If in all these iterations the chosen value 
never exceeds some number, setting Z to that number or slightly above it 
will ensure that Z is not too low to produce bias while still retaining some 
computationally efficiency. 

From the results, given in Table 1, it is apparent that a value for Z of 
about 20 should be sufficient for the current study. The required Z decreases 
with higher sample sizes and less concentrated data. It is recommended to 
investigate sensitivity on Z before application. 

3.2. A Metropolis-Hastings sampler 

Another approach is to employ the Metropolis-Hastings (MH) method 
(Metropolis et ah, 1953; Hastings, 1970) to sample from the posterior of a 
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Table 1: Maximum k that was picked out as the smallest value after 10000 iterations of 
the Gibbs sampler applied to 100 datasets for different sample sizes (n), concentration (k) 
and number of groups (J). 


K 


J= 1 



CO 


n = 10 

n = 30 

n = 100 

n = 10 

n = 30 

n = 100 

0.1 

7 

4 

3 

5 

4 

3 

1 

13 

7 

7 

6 

5 

6 

4 

17 

11 

8 

10 

9 

7 

8 

17 

13 

9 

10 

9 

7 

16 

19 

14 

9 

13 

9 

9 

32 

19 

12 

9 

12 

9 
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von Mises distribution. MH algorithms are often slower and encounter more 
autocorrelation and convergence problems than Gibbs samplers, as Gibbs 
sampling can be seen as a special case of the MH algorithm. However, consid¬ 
ering the complicated nature of adding latent variables in the Gibbs sampler 
described above, an MH method may be advantageous. Another advantage 
is that the algorithm is reasonably straightforward. On the other hand, this 
method depends on a proper choice for the proposal density, which may limit 
its use. 

3.2.1. Sampler for a single group 

To apply the sampler for a single group, samples are needed for a single 
/i and K. The conditional distribution of the mean direction /i is known and 
is easy to sample from using a Gibbs step. The conditional distribution of k 
is known but difficult to sample from, which will be solved by applying an 
MH step. 

For the MH step, two main ingredients are required; the posterior from 
which samples are required, and a proposal density from which it is straight¬ 
forward to sample. The conditional posterior /(k|/U, 9 ) is given in Section 2.4. 
As a non-negative proposal density, the y^-distribution will be used. More 
complex and flexible proposal densities, such as the Gamma distribution, 
may provide benefits, but for the sake of simplicity only the y^-distribution 
will be considered here. The full algorithm will not be presented here as it 
is a special case of the sampler described next with J = 1. 
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3.2.2. Sampler for multiple groups 

The MH sampler for multiple groups may employ the posterior /(^t, k\6) 
as given in Section 2.5. However, in order to prevent underflow issues, the 
natural logarithm of the posterior is used, which is 

n 

Infill, k\6) oc -mt In [Io{k)] + k ^ Rnj cos{fij - p.nj)- 

i=l 

Let Kcur he the current value of k, and x^{x\h) be the chi-square distribution 
with h degrees of freedom. Then, the MH method is given by the following 
7 steps: 

1. Set Kcur to its starting value. 

2. For each group j, draw a value p,j from VM(/Xj|/i„j, RnUcur)- 

3. Draw a candidate Kean from {^^can\f^cur)■ 

4. Calculate the MH ratio as 

a= \nf{Kcan\t^,0) +^^X^{>^curWcan) 
blI111 X {,^can\^cur') 1 

where pL = {/ii,... ,/nj}, a row vector of current values of fi for each 
group. 

5. Draw a value u from U{0, 1). 

6. If a > Inu, set Kcur = i^can- Elsewise, remain at Kcur- 

7. Repeat step 2-6 until a sufficient number of samples have been ob¬ 
tained. 

3.3. A rejection sampler 

In a recent paper, Forbes and Mardia (2014) present a promising new 
algorithm to sample from the conditional posterior /(«:|/x, 0). The approach 
is largely focused on computational speed, and was motivated by the fact 
that plugging a Bessel function approximation into the von Mises posterior 
leads to a Gamma distribution. 

The approach performs rejection sampling for k on the basis of two param¬ 
eters, {? 7 , /So}- For a single group of data with no prior, as described in Forbes 
and Mardia (2014), the algorithm sets rj = n and /Sq = —n~^ cos(6'j —/r). 
These are then used to compute the approximately optimal parameters for a 
Gamma proposal, such that the probability of rejection is minimized. In the 
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rejection step, a candidate for k is then repeatedly drawn from this Gamma 
proposal density until it is accepted. 

Samples of fi are drawn outside of the algorithm, which may be done 
easily as in step 2 of the MH procedure in Section 3.2.2. As with the MH 
method, only k requires a starting value. 

3.3.1. Sampler for a single group 

Forbes and Mardia (2014) describe the rejection sampler for a single group 
of data using a constant prior. The conjugate prior that is preferred here can 
be added as described below. 

Using the sample mean direction 9, it can be shown that 

/n \ Rcos{fi-e) 

Po = -n y cos(6'i - /i) = - . 

i=l 

This relation to the resultant length means that fin,Rni and m from the 
desired posterior can be plugged into the formula for /3o, to obtain 

^ Rn cos(/i - Hn) 

Pn 

m 

Then, the rejection algorithm can be applied exactly as given in Forbes and 
Mardia (2014), using instead of /do and rj = m. 

3.3.2. Sampler for multiple groups 

As the sampling of mean directions occurs outside of the main algorithm, 
it is straightforward to sample separate mean directions for each group. How¬ 
ever, the common k depends on the sampled means through fin- After com¬ 
putation of fin, the rejection algorithm no longer uses the data 0 or the 
current value of pi. The sampler will thus be extended to multiple groups by 
once again rewriting /3„. 

Using Rnj and p.nj, and as before, let 

^ _ E/.l Rni COs(/i l^nj') 

Pt = • 

mt 

Then, the rejection algorithm can be applied using fit instead of flo and 

p = mt. 
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Gibbs 


MH 


Rejection 




Figure 2: Example chains of fj, (in degrees) and k drawn in the first 500 iterations of each 
of the three methods, with no burn-in and without thinning the chain, where J = 3, true 
K = 0.1, and rij = 30. 

4. Simulation study 

In the previous section, three distinct methods to sample from the pos¬ 
terior of the von Mises distribution with multiple groups were shown. In 
this section, these three methods will be evaluated on their performance and 
efficiency. 

4.I. Methods 

All three methods were implemented in C++ within R (R Development 
Core Team, 2015) via Repp (Eddelbuettel and Frangois, 2011). To illustrate 
the differences between the three methods. Figure 2 shows example chains 
of the first 500 iterations for each of the three methods. It can be seen that 
the Gibbs sampler has large autocorrelation and slow convergence, that the 
MH algorithm can have low acceptance probability but converges fast, and 
that the rejection algorithm converges fast and mixes well. 
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The three sampling methods were applied to various scenarios, which 
differed in the following three properties. First, the samplers analyzed both 
a single group of data (J = 1) and three groups of data (J = 3). Second, 
sample sizes of 10, 30 and 100 were used. For J = 3, this sample size denotes 
the sample size per group (rij), making the total sample size Srij. Third, 
values for the concentration parameter k were 0.1, 4 and 32. Because the 
multiple groups are assumed have equal k, all three groups of data were 
sampled given the same true k. These manipulations resulted in a 3x2x3x3 
simulation study design, for a total of 54 cells. For J = 1, the true mean was 
set at 20°, while true means for J = 3 were set at 20°, 40°, and 60°. 

For each cell, 2000 datasets were generated, each of which was analyzed 
with each sampler. Burn-in and lag (that is, how much the chain will be 
thinned) were set to appropriate values (see Section 4.2), after which the 
first 10000 retained iterations of both and k were saved. Although all 
three methods allow inclusion of prior information through the conjugate 
prior, a non-informative prior was used throughout the simulation study by 
setting fiQ = 0,Ro = 0, and c = 0. Each method requires a starting value 
for K, which was set to 2 in all cases. The Gibbs sampler required additional 
starting values for /i and w, which were set at 0 and 4 respectively, regardless 
of sample size or k. Additionally, for the Gibbs sampler an appropriate Z 
must be chosen (see Section 3.1.3), which was set to 25 throughout this study. 

4-2. Convergence 

As convergence is achieved at a different number of iterations for each of 
the methods, several runs of each were assessed for each cell in order to assess 
convergence and required burn-in and lag, which were then set correctly for 
the simulation study. Burn-in was set, in all cases, to 500 times the chosen 
value for the lag. 

The Gibbs sampler performed adequately for small samples with large 
dispersion. For example, a single group of 10 datapoints with true k = 0.1 
produced a reasonable sample from the posterior using a lag of 2, which means 
saving every other iteration. With larger sample sizes and more concentrated 
data, the autocorrelation increases quickly, requiring a lag of 250 for k = 4 
and Uj = 100 with J = 3. For values of k above about 7, application of the 
Gibbs sampler becomes unfeasible, so results for the Gibbs sampling method 
with true k = 32 are not reported. 

The MH algorithm fared much better, converging quickly in all tested 
situations. However, applying MH methods requires reasonable acceptance 
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rates, which can be computed by QaocjQi where Qacc is the number of ac¬ 
cepted iterations, and Q is the total number of iterations. Johnson and 
Albert (1999) suggest an acceptance rate of about 50% to be ideal. A low 
acceptance rate may suggest a badly fitting proposal density, while a high 
acceptance rate (i.e. close to 1) may suggest that the algorithm has yet 
to converge properly. As convergence was assessed seperately and achieved 
quite quickly, only too low acceptance rates were of concern here. Accep¬ 
tance rates increased for smaller sample sizes and more concentrated data. 
For example, for a single group of data with n = 100 and k = 0.1 the average 
acceptance rate was as low as 0.1, while for n = 10, k = 32 the acceptance 
rate was as high as .81. 

The rejection algorithm converged almost immediately and showed barely 
any autocorrelation. An acceptance rate can be computed by Q/Qcan, where 
Q is the total number of accepted candidates, which was chosen beforehand as 
the desired number of iterations, and Qcan is the total number of candidates, 
including those that were accepted. The algorithm rejected no more than 
15% of the candidates in any case. 

4-3. Mode estimation for K 

Estimating k as the mean or the median of the posterior sample may lead 
to biased results, as k is non-negative and has a right-skewed distribution. For 
skewed distributions, the mode usually provides the least biased estimate. An 
estimate of the mode can be obtained by using the Highest Density Interval 
(HDI), which is the shortest interval containing a certain percentage of the 
data (Venter, 1967). Here, the mode was estimated to be the midpoint of 
the 10% HDI. 

4-4- Results 

In Tables 2 and 3, results are displayed for a single group and three 
groups, respectively. As mentioned before, applying the Gibbs sampler to a 
situation with k = 32 is unfeasible, and therefore these rows are left empty. 

The column below posterior /i mean gives the average of the posterior 
mean direction of either /i or {/xi,/i 2 ,/xs} of all replications. The coverage of 
the mean denotes the proportion of replications where 95% Central Credible 
Interval (CCI) contained the true fi. For J = 3, this coverage was averaged 
over the three means. The desired value of the coverage is .95. For the 
posterior k, the estimated mode for each replication was saved, as well as the 
95% HDI. The average of the mode over replications is provided in the column 
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Table 2: Average of posterior properties over 2000 replications for J = 1, with true /i = 20°, for 
different sample sizes (n) and concentration k. 


n 

K, 

Method 

Posterior /i 
Mean Goverage 

Posterior 

Mode Goverage 

Acc} 

MGT" 

10 

0.1 

Gibbs 

15.52 

0.74 

0.34 

0.98 

1 

0.74 



MH 

15.72 

0.76 

0.34 

0.96 

0.24 

0.02 



Rejection 

15.70 

0.75 

0.36 

0.97 

0.91 

0.02 


4 

Gibbs 

20.26 

0.97 

4.73 

0.93 

1 

9.60 



MH 

20.26 

0.96 

5.03 

0.94 

0.53 

0.02 



Rejection 

20.26 

0.96 

4.90 

0.96 

1 

0.02 


32 

Gibbs 

— 

— 

— 

— 

— 

— 



MH 

19.91 

0.92 

40.94 

0.95 

0.81 

0.02 



Rejection 

19.91 

0.92 

41.42 

0.95 

1 

0.02 

30 

0.1 

Gibbs 

22.35 

0.81 

0.19 

0.98 

1 

1.08 



MH 

22.03 

0.81 

0.18 

0.96 

0.16 

0.02 



Rejection 

22.15 

0.80 

0.20 

0.97 

0.89 

0.02 


4 

Gibbs 

19.99 

0.97 

4.20 

0.94 

1 

11.50 



MH 

19.99 

0.97 

4.27 

0.96 

0.36 

0.02 



Rejection 

19.99 

0.97 

4.18 

0.97 

1 

0.02 


32 

Gibbs 

— 

— 

— 

— 

— 

— 



MH 

19.98 

0.94 

34.43 

0.95 

0.70 

0.02 



Rejection 

19.98 

0.94 

34.40 

0.95 

1 

0.02 

100 

0.1 

Gibbs 

20.82 

0.86 

0.11 

0.99 

1 

3.58 



MH 

20.97 

0.87 

0.11 

0.98 

0.10 

0.02 



Rejection 

20.91 

0.86 

0.12 

0.98 

0.86 

0.02 


4 

Gibbs 

19.99 

0.96 

4.06 

0.94 

1 

57.59 



MH 

19.99 

0.95 

4.08 

0.94 

0.21 

0.02 



Rejection 

19.99 

0.96 

4.02 

0.96 

1 

0.02 


32 

Gibbs 

— 

— 

— 

— 

— 

— 



MH 

20.03 

0.95 

32.72 

0.96 

0.53 

0.02 



Rejection 

20.03 

0.95 

32.72 

0.96 

1 

0.02 


“Posterior n mode denotes the mode as described in section 4.3. Coverage denotes the 
proportion of replications in which the true k fell within the 95 % HDI. 

^Acceptance ratio. For Gibbs sampling, this is always 1. For Metropolis-Hastings, QacclQ 
is given. For the rejection method, this is QjQcan- 

“Mean Computation Time of one replication in seconds. 


16 







posterior k mode, followed by the posterior k coverage, which denotes the 
proportion of replications for which the true value fell within the 95% HDI. 
The last two columns provide the acceptance rate and the mean computation 
time (MCT) per replication in seconds. 

The size of the bias tends to depend on the true value. In order to 
investigate by what factor estimates are off, a relative bias can be calculated 
as Bias/True value. The relative bias may help facilitate interpretation of 
relative severity of bias for k, in order to allow for more accurate comparisons 
between cells. 

4.4-T a single group 

Posterior ja. All three methods provided similar results for the posterior 
mean, which was generally close to the true mean. Estimates were closer to 
the true value for increasing n and increasing k. The worst case was found for 
K = 0.1, n = 10, where the difference between the true fi (20°) and average 
posterior /r was about 4.3°. However, this difference can almost surely be 
attributed to sampling error of datasets instead of an issue with the MCMC 
methods. When k = 0.1, the distribution of the sample mean direction 6 is 
close to the circular uniform distribution, so that the average over the sample 
mean directions, even over 2000 datasets, shows some random variation. This 
is supported by the fact that the MCMC methods all show the same difference 
from the true value. In general, there seems to be no systematic bias in the 
estimation of the mean direction. 

Coverage was generally adequate as well. Undercoverage was observed for 
K = 0.1 with all sample sizes, although the coverage improved with increasing 
sample size. Coverage for /i relies on correct procedures of sampling both /i 
and K. For example, an upwards bias in k results in a lower coverage for 
/i. Sampling methods for /r are straightforward, and thus deviations of the 
coverage from .95 are likely due to a dehcient mechanism to sample k, as the 
current value of k is used in the distribution of /i. 

Posterior k. The mode of k shows a systematic upward bias for all cells and 
all methods. The relative bias is worse for smaller k and thus more dispersed 
data. The bias also decreases with increasing n and nearly disappears for 
n = 100. This bias coincides with a well-known bias in maximum likeli¬ 
hood estimation of u (Mardia and Jupp, 1999, p. 87). For suggestions on 
corrections to obtain unbiased estimates, see Best and Fisher (1981). 

Regardless of the observed bias, coverage for k was generally acceptable. 
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Figure 3: Coverages of k for different sample sizes in) and concentration (k), all with 
J = 3. The solid straight line indicates the target of the coverages, .95. 

fluctuating around .95 for all methods. In conditions with low concentration, 
a tendency towards overcoverage (coverage above .95) can be seen. 

Mean Computation Time. The hnal column denotes the computational time 
for the algorithms as implemented in C++ via Repp, which was averaged 
over all replications. The Gibbs sampling method performed worst by far. 
Its computational time increased with higher sample sizes and higher concen¬ 
tration. The longest reported time was 57.59 seconds per replication. Both 
the MH and rejection algorithm were very fast. Their computational time 
was fairly independent of both sample size and k, so that it always took 
about 0.02 seconds for a replication. 

4-4■2. Multiple groups 

In Table 3, results for analysis of multiple groups of data are shown. For 
the posterior group mean directions the observed pattern was similar to the 
single group case. The Gibbs sampler showed slight overcoverage for /i when 
K = 0.1, while the MH and rejection method show slight undercoverage with 
Uj = 10, K = 0.1. In all other cells, coverage of p, was adequate. 

As with J = 1, a systematic upward bias was observed in k for the MH 
and rejection sampler. The bias was generally smaller for J = 3 compared to 
J = 1, because the total amount of observations is three times as large. The 
strongest bias was observed for Uj = 10. For nj = 30 and Uj = 100 much less 
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Table 3: Average of posterior properties over 2000 replications for J = 3, with true means 
= 20°, ^2 = 40°, /i 3 = 60°, for different samples sizes per group (rij) and concentration (k). 


n, 

K, 

Method 

Ml 

Posterior p. 

M2 M3 

Coverage 

Posterior 

Mode Coverage 

Acc.'' 

MCT“ 

10 

0.1 

Gibbs 

21.55 

42.64 

64.59 

0.95 

0.07 

1 

1 

3.32 



MH 

21.99 

43.10 

65.30 

0.92 

0.23 

0.97 

0.17 

0.05 



Rejection 

22.17 

43.11 

65.07 

0.91 

0.26 

0.98 

0.91 

0.05 


4 

Gibbs 

20.01 

40.11 

60.02 

0.97 

4.06 

0.94 

1 

7.47 



MH 

19.99 

40.14 

60.01 

0.96 

4.35 

0.95 

0.36 

0.05 



Rejection 

19.99 

40.13 

60.02 

0.96 

4.26 

0.96 

1 

0.05 


32 

Gibbs 

— 

— 

— 

— 

-- 

— 

— 

— 



MH 

19.98 

39.93 

60.10 

0.94 

34.51 

0.95 

0.70 

0.05 



Rejection 

19.98 

39.93 

60.10 

0.94 

34.55 

0.96 

1 

0.05 

30 

0.1 

Gibbs 

15.16 

39.28 

51.20 

0.97 

0.04 

1 

1 

1.30 



MH 

15.42 

40.69 

50.34 

0.94 

0.14 

0.98 

0.11 

0.05 



Rejection 

15.68 

40.68 

50.36 

0.94 

0.15 

0.98 

0.88 

0.05 


4 

Gibbs 

19.97 

40.09 

59.86 

0.96 

3.99 

0.93 

1 

8.66 



MH 

19.95 

40.09 

59.83 

0.96 

4.07 

0.95 

0.22 

0.05 



Rejection 

19.96 

40.10 

59.83 

0.96 

4.01 

0.96 

1 

0.05 


32 

Gibbs 

— 

— 

■- 

— 

-- 

— 

— 

— 



MH 

19.94 

40.03 

60.02 

0.94 

32.84 

0.95 

0.55 

0.05 



Rejection 

19.94 

40.03 

60.02 

0.95 

32.79 

0.95 

1 

0.05 

100 

0.1 

Gibbs 

18.00 

39.06 

58.53 

0.97 

0.04 

1 

1 

4.77 



MH 

18.08 

39.41 

58.42 

0.95 

0.10 

0.97 

0.07 

0.05 



Rejection 

18.30 

39.49 

58.57 

0.95 

0.10 

0.98 

0.86 

0.05 


4 

Gibbs 

19.98 

39.98 

60.10 

0.95 

4.01 

0.93 

1 

31.21 



MH 

19.96 

39.98 

60.10 

0.95 

4.04 

0.94 

0.13 

0.05 



Rejection 

19.96 

39.98 

60.10 

0.95 

3.98 

0.96 

1 

0.05 


32 

Gibbs 

— 

— 

— 

— 

— 

— 

— 

— 



MH 

19.98 

40 

60.03 

0.95 

32.24 

0.95 

0.36 

0.05 



Rejection 

19.98 

40 

60.03 

0.95 

32.21 

0.95 

1 

0.05 


“Posterior k mode denotes the mode as described in section 4.3. Coverage denotes the 
proportion of replications in which the true n fell within the 95 % HDI. 

^Acceptance ratio. For Gibbs sampling, this is always 1. For Metropolis-Hastings, QacclQ 
is given. For the rejection method, this is QIQcan- 

“Mean Computation Time of one replication in seconds. 
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bias was observed. The Gibbs sampler, however, shows a downward bias in 
these cases. Fignre 3 shows the coverages of k per method for different sample 
sizes and concentration, with J = 3. Coverages for k were mostly adequate, 
although the Gibbs sampler performed badly, with severe overcoverage with 
K = .1. The MH and rejection sampler performed well, although the rejection 
method seemed slightly more prone to overcoverage. These coverages in the 
range .95-1 indicate that the HDI would be chosen too wide so that the true 
value falls within the HDI more often than expected. Finally, computational 
time increased slightly with three groups for MH and rejection. 

5. Discussion 

This paper presented three different MCMC approaches for Bayesian es¬ 
timation of the mean directions fij of multiple groups of circular data with 
common but unknown concentration k. These approaches were based on 
existing knowledge on Bayesian analysis of circular data that could be used 
for analysis of a single group of circular data. Additionally, a systematic 
investigation of the performance of the three approaches was performed. 

Gomparing the methods, clear differences became apparent. The Gibbs 
sampler encountered many problems, among which were undesirable cover¬ 
ages, sizable computational time, and complexity in application. The MH 
method performed adequately, but it does not show desirable acceptance 
rates for large datasets with small concentration when using the current 
proposal density. The rejection algorithm by Forbes and Mardia (2014) was 
found to be the most promising of the MGMG-methods available in the lit¬ 
erature at present, due to fast computational speed, fast convergence and 
adequate coverage. 

The model developed here is still limited in terms of scope; it provides 
a basic between-subjects design for multiple groups of circular data, but 
extensions of this model such as a between-within-design or the inclusion 
of covariates have yet to be developed. Although in the present study the 
rejection algorithm was the most advantageous, a general MH algorithm 
may prove more flexible for such extended models due to its more direct 
approach. It is expected that extending the MGMG methods provided here 
to more complex models will exacerbate any issues regarding acceptance rates 
in different ways, so it remains to be seen which method will perform best 
after such an extension. 
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This study is also limited to the assumption that the data follows the 
von Mises distribution. Much of the literature on circular statistics assumes 
that circular data encountered in practice will follow this distribution, but 
this is not always the case. Groundwork for a general method for any kind 
of circular distribution was provided by Bhattacharya and Sengupta (2009) 
and employs importance sampling. Because importance sampling relies on 
defining an additional density to approximate normalizing constants, simpler 
methods such as the ones presented here are preferred where possible. 

Finally, the question remains whether the assumption of a common k 
across groups is a reasonable assumption. Circular data to be analyzed must 
be tested on this assumption. If it does not hold, methods presented in this 
paper may simply be applied to each group separately, each with J = 1. 

In sum, the intrinsic approach offers a promising and flexible approach to 
Bayesian analysis of circular data, and its extension to a model with J mul¬ 
tiple groups is an important first step towards developing flexible modeling 
of circular data in between-subjects designs. 

6. Remarks 

This work was supported by a Vidi grant awarded to I. Klugkist from 
NWO, the Dutch Organization for Scientific Research (NWO 452-12-010). 

R Code for application of the methods in this article, as well as supple¬ 
mental files, are available online at 

github.com/keesmulder/BayesianMultigroupCircularData. 
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